Goto

Collaborating Authors

 supervised learning problem







Learning task-specific predictive models for scientific computing

arXiv.org Artificial Intelligence

We consider learning a predictive model to be subsequently used for a given downstream task (described by an algorithm) that requires access to the model evaluation. This task need not be prediction, and this situation is frequently encountered in machine-learning-augmented scientific computing. We show that this setting differs from classical supervised learning, and in general it cannot be solved by minimizing the mean square error of the model predictions as is frequently performed in the literature. Instead, we find that the maximum prediction error on the support of the downstream task algorithm can serve as an effective estimate for the subsequent task performance. With this insight, we formulate a task-specific supervised learning problem based on the given sampling measure, whose solution serves as a reliable surrogate model for the downstream task. Then, we discretize the empirical risk based on training data, and develop an iterative algorithm to solve the task-specific supervised learning problem. Three illustrative numerical examples on trajectory prediction, optimal control and minimum energy path computation demonstrate the effectiveness of the approach.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The paper starts by introducing the background and showing the contributions. The authors then use the Zhang and Ando result showing that SSL reduces to an equivalent kernel-based supervised learning problem for the rest of the paper. In section 2 and appendix, they provide an error bound showing that minimising the spectral norm of the graph kernel matrix is a good way to improve generalisation. In section 3, they add a spectral norm regularization term to the Zhang and Ando formulation, and provide links between the error convergence and the Lovasz number of the data graph. Section 4 proposes a proximal solver, where the specificity is that it deals with the projection step approximately to handle the constraint of a cone-polytope intersection.


Review for NeurIPS paper: Functional Regularization for Representation Learning: A Unified Theoretical Perspective

Neural Information Processing Systems

Summary and Contributions: Post-rebuttal comments Thank you for the response. I am happy with the explanations and will increase my score, thus recommending the paper for acceptance. The paper provides a theoretical background for learning tasks that combine two steps: i) representation learning (e.g., via auto-encoders or self-supervised learning) and ii) supervised learning with instances represented via the features learned in step i). The assumption is that in addition to labelled examples the algorithm has access to unlabelled instances. The first step learns a representation function h(x) that belongs to some hypothesis space H.


More data speeds up training time in learning halfspaces over sparse vectors

Neural Information Processing Systems

The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task?


Adaptive Online Learning Dylan J. Foster ∗ Alexander Rakhlin †

Neural Information Processing Systems

We propose a general framework for studying adaptive regret bounds in the online learning setting, subsuming model selection and data-dependent bounds. Given a data-or model-dependent bound we ask, "Does there exist some algorithm achieving this bound?" We show that modifications to recently introduced sequential complexity measures can be used to answer this question by providing sufficient conditions under which adaptive rates can be achieved. In particular each adaptive rate induces a set of so-called offset complexity measures, and obtaining small upper bounds on these quantities is sufficient to demonstrate achievability. A cornerstone of our analysis technique is the use of one-sided tail inequalities to bound suprema of offset random processes. Our framework recovers and improves a wide variety of adaptive bounds including quantile bounds, second order data-dependent bounds, and small loss bounds. In addition we derive a new type of adaptive bound for online linear optimization based on the spectral norm, as well as a new online PAC-Bayes theorem.